While risk-neutral reinforcement learning has shown experimental success in a number of applications, it is well known to be non-robust with respect to noise and perturbations in the parameters of the system. For this reason, risk-sensitive reinforcement learning algorithms have been studied to introduce robustness and sample efficiency, leading to better real-life performance. In this work, we introduce new model-free risk-sensitive reinforcement learning algorithms as variations of widely used Policy Gradient algorithms with similar implementation properties. In particular, we study the effect of exponential criteria on the risk-sensitivity of the policy of a reinforcement learning agent, and develop variants of the Monte Carlo Policy Gradient algorithm and the online (temporal-difference) Actor-Critic algorithm. Analytical results show that the use of exponential criteria generalizes commonly used ad-hoc regularization approaches. The implementation, performance, and robustness properties of the proposed methods are evaluated in simulated experiments.
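As a rough illustration of what an exponential criterion does, one common form in the risk-sensitive literature is J_beta = (1/beta) log E[exp(beta R)], which for small beta behaves like E[R] + (beta/2) Var[R]. The sketch below assumes that form; it is not the paper's algorithm, the function names are hypothetical, and the sign convention for beta (which sign counts as risk-averse) depends on whether returns are rewards or costs, which the abstract does not specify.

```python
import math
import statistics


def exponential_criterion(returns, beta):
    """Return (1/beta) * log(mean(exp(beta * R))) over sampled returns.

    Assumed form of the exponential criterion; a log-sum-exp shift is used
    so that large |beta * R| values do not overflow.
    """
    if beta == 0:
        # The limit beta -> 0 recovers the risk-neutral expected return.
        return statistics.fmean(returns)
    scaled = [beta * r for r in returns]
    shift = max(scaled)
    mean_exp = sum(math.exp(s - shift) for s in scaled) / len(scaled)
    return (shift + math.log(mean_exp)) / beta


def mean_variance_approximation(returns, beta):
    """Second-order expansion of the criterion: mean + (beta / 2) * variance."""
    return statistics.fmean(returns) + 0.5 * beta * statistics.pvariance(returns)


if __name__ == "__main__":
    sampled_returns = [1.0, 2.0, 3.0]  # toy episode returns, made up for illustration
    for beta in (-1.0, -0.01, 0.01, 1.0):
        exact = exponential_criterion(sampled_returns, beta)
        approx = mean_variance_approximation(sampled_returns, beta)
        print(f"beta={beta:+.2f} exact={exact:.6f} mean-variance={approx:.6f}")
```

For small |beta| the two quantities agree closely, which is one way to see why an exponential criterion can act like a variance-style regularizer on the return.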
Hierarchical learning algorithms that gradually approximate a solution to a data-driven optimization problem are essential to decision-making systems, especially under limitations on time and computational resources. In this study, we introduce a general-purpose hierarchical learning architecture that is based on the progressive partitioning of a possibly multi-resolution data space. The optimal partition is gradually approximated by solving a sequence of optimization sub-problems that yield a sequence of partitions with an increasing number of subsets. We show that the solution of each optimization problem can be estimated online using gradient-free stochastic approximation updates. As a consequence, a function approximation problem can be defined within each subset of the partition and solved using the theory of two-timescale stochastic approximation algorithms. This simulates an annealing process and defines a robust and interpretable heuristic method to gradually increase the complexity of the learning architecture in a task-agnostic manner, giving emphasis to regions of the data space that are considered more important according to a predefined criterion. Finally, by imposing a tree structure on the progression of the partitions, we provide a means to incorporate potential multi-resolution structure of the data space into this approach, significantly reducing its complexity, while introducing hierarchical feature extraction properties similar to certain classes of deep learning architectures. Asymptotic convergence analysis and experimental results are provided for clustering, classification, and regression problems.
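In this work, we introduce a learning model designed to meet the needs of applications in which computational resources are limited and robustness and interpretability are prioritized. Learning problems can be formulated as constrained stochastic optimization problems, with the constraints originating mainly from model assumptions that define a trade-off between complexity and performance. This trade-off is closely related to over-fitting, generalization capacity, and robustness to noise and adversarial attacks, and depends on both the structure and complexity of the model and the properties of the optimization methods used. We develop an online prototype-based learning algorithm based on annealing optimization that is formulated as an online gradient-free stochastic approximation algorithm. The learning model can be viewed as an interpretable competitive-learning neural network model for supervised, unsupervised, and reinforcement learning. The annealing nature of the algorithm contributes to minimal hyper-parameter tuning requirements, prevention of poor local minima, and robustness with respect to the initial conditions. At the same time, it provides online control over the performance-complexity trade-off by progressively increasing the complexity of the learning model through an intuitive bifurcation phenomenon. Finally, the use of stochastic approximation enables the study of the convergence of the learning algorithm through mathematical tools from dynamical systems and control, and allows for its integration with reinforcement learning algorithms, constructing an adaptive state-action aggregation scheme.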
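We consider the problem of understanding the coordinated movements of biological or artificial swarms. In this regard, we propose a learning scheme to estimate the coordination laws of the interacting agents from observations of the swarm's density over time. We describe the dynamics of the swarm based on pairwise interactions according to a Cucker-Smale flocking model, and express the swarm's density evolution as the solution to a system of mean-field hydrodynamic equations. We propose a new family of parametric functions to model the pairwise interactions, which allows the mean-field macroscopic system of integro-differential equations to be efficiently solved as an augmented system of PDEs. Finally, we incorporate the augmented system in an iterative optimization scheme to learn the dynamics of the interacting agents from observations of the swarm's density evolution over time. The results of this work can offer an alternative approach to studying how animal flocks coordinate, help create new control schemes for large networked systems, and serve as a central part of defense mechanisms against adversarial attacks.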
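For readers unfamiliar with flocking models driven by pairwise interactions, the following is a minimal agent-level sketch of one explicit Euler step of a Cucker-Smale-type model. It uses the classical communication weight psi(r) = K / (1 + r^2)^b as an assumption; it is not the new parametric family proposed in the paper, it does not touch the mean-field PDE formulation, and the function and parameter names are hypothetical.

```python
import math


def communication_weight(distance, strength=1.0, decay=0.5):
    """Classical Cucker-Smale weight K / (1 + r^2)^b (assumed, not the paper's family)."""
    return strength / (1.0 + distance * distance) ** decay


def cucker_smale_step(positions, velocities, dt):
    """Advance all agents by one explicit Euler step of size dt.

    Each agent's velocity relaxes toward its neighbours' velocities, with
    closer neighbours weighted more strongly. Returns new lists; the inputs
    are not modified.
    """
    n = len(positions)
    new_velocities = []
    for i in range(n):
        acceleration = [0.0] * len(velocities[i])
        for j in range(n):
            weight = communication_weight(math.dist(positions[i], positions[j]))
            for k in range(len(acceleration)):
                acceleration[k] += weight * (velocities[j][k] - velocities[i][k]) / n
        new_velocities.append([v + dt * a for v, a in zip(velocities[i], acceleration)])
    new_positions = [
        [x + dt * v for x, v in zip(positions[i], velocities[i])] for i in range(n)
    ]
    return new_positions, new_velocities


if __name__ == "__main__":
    # Two agents moving toward each other; values are made up for illustration.
    x = [[0.0, 0.0], [1.0, 0.0]]
    v = [[1.0, 0.0], [-1.0, 0.0]]
    x, v = cucker_smale_step(x, v, dt=0.1)
    print(x, v)
```

Because the weight is symmetric in the two agents, the mean velocity of the group is preserved while the velocity differences shrink, which is the alignment behaviour that a density-level description would have to reproduce.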
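The existence of a universal learning architecture in human cognition is a widely spread conjecture supported by experimental findings from neuroscience. While no low-level implementation can be specified yet, an abstract outline of human perception and learning is believed to entail three basic properties: (a) hierarchical attention and processing, (b) memory-based knowledge representation, and (c) progressive learning and knowledge compaction. We approach the design of such a learning architecture from a system-theoretic viewpoint, developing a closed-loop system with three main components: (i) a multi-resolution analysis pre-processor, (ii) a group-invariant feature extractor, and (iii) a progressive knowledge-based learning module. Multi-resolution feedback loops are used for learning, i.e., for adapting the system parameters to online observations. To design (i) and (ii), we build upon the established theory of wavelet-based multi-resolution analysis and the properties of group convolution operators. Regarding (iii), we introduce a novel learning algorithm that constructs progressively growing knowledge representations in multiple resolutions. The proposed algorithm is an extension of the Online Deterministic Annealing (ODA) algorithm based on annealing optimization, solved using gradient-free stochastic approximation. ODA has inherent robustness and regularization properties and provides a means to progressively increase the complexity of the learning model, i.e., the number of neurons, as needed, through an intuitive bifurcation phenomenon. The proposed multi-resolution approach is hierarchical, progressive, knowledge-based, and interpretable. We illustrate the properties of the proposed architecture in the context of state-of-the-art learning algorithms and deep learning methods.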
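Inherent in virtually every iterative machine learning algorithm is the problem of hyper-parameter tuning, which includes three major design parameters: (a) the complexity of the model, e.g., the number of neurons in a neural network, (b) the initial conditions, which heavily affect the behavior of the algorithm, and (c) the dissimilarity measure used to quantify its performance. We introduce an online prototype-based learning algorithm that can be viewed as a progressively growing competitive-learning neural network architecture for classification and clustering. The learning rule of the proposed approach is formulated as an online gradient-free stochastic approximation algorithm that solves a sequence of appropriately defined optimization problems, simulating an annealing process. The annealing nature of the algorithm helps avoid many local minima, offers robustness with respect to the initial conditions, and provides a means to progressively increase the complexity of the learning model through an intuitive bifurcation phenomenon. The proposed approach is interpretable, requires minimal hyper-parameter tuning, and allows online control over the performance-complexity trade-off. Finally, we show that Bregman divergences appear naturally as a family of dissimilarity measures that play a central role in both the performance and the computational complexity of the learning algorithm.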
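To give a feel for how a temperature can control the effective complexity of a prototype-based model, the sketch below computes Gibbs-style association probabilities between a data point and a set of prototypes, as used in classical deterministic annealing. This is an assumed, generic construction rather than the paper's online update rule; the squared Euclidean distance is chosen here as one example of a Bregman divergence, and all names are hypothetical.

```python
import math


def squared_euclidean(x, prototype):
    """Squared Euclidean distance, one member of the Bregman divergence family."""
    return sum((a - b) ** 2 for a, b in zip(x, prototype))


def association_probabilities(x, prototypes, temperature):
    """Soft assignment p(prototype | x) proportional to exp(-d(x, prototype) / T).

    The minimum distance is subtracted before exponentiating so that small
    temperatures do not cause underflow in every term.
    """
    distances = [squared_euclidean(x, p) for p in prototypes]
    closest = min(distances)
    weights = [math.exp(-(d - closest) / temperature) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    point = (0.0, 0.0)
    prototypes = [(0.0, 0.0), (1.0, 0.0)]  # toy prototypes, made up for illustration
    for temperature in (1e6, 1.0, 0.01):
        print(temperature, association_probabilities(point, prototypes, temperature))
```

At high temperature every prototype receives nearly equal weight, so the prototypes behave like a single one; as the temperature is lowered the assignment sharpens toward the nearest prototype, which is the regime in which additional distinct prototypes start to matter.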
Given a large graph with few node labels, how can we (a) identify the mixed network-effect of the graph and (b) predict the unknown labels accurately and efficiently? This work proposes Network Effect Analysis (NEA) and UltraProp, which are based on two insights: (a) the network-effect (NE) insight: a graph can exhibit not only one of homophily and heterophily, but also both or neither in a label-wise manner, and (b) the neighbor-differentiation (ND) insight: neighbors have different degrees of influence on the target node based on the strength of connections. NEA provides a statistical test to check whether a graph exhibits a network-effect or not, and surprisingly discovers the absence of NE in many real-world graphs known to have heterophily. UltraProp solves the node classification problem with notable advantages: (a) Accurate, thanks to the network-effect (NE) and neighbor-differentiation (ND) insights; (b) Explainable, precisely estimating the compatibility matrix; (c) Scalable, being linear in the input size and handling graphs with millions of nodes; and (d) Principled, with a closed-form formula and theoretical guarantees. Applied to eight real-world graph datasets, UltraProp outperforms top competitors in terms of accuracy and run time, requiring only stock CPU servers. On a large real-world graph with 1.6M nodes and 22.3M edges, UltraProp achieves more than a 9-fold speedup (12 minutes vs. 2 hours) compared to most competitors.
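As an illustration of what a compatibility matrix expresses, the sketch below builds a naive one by counting label pairs across edges whose endpoints are both labeled and normalizing each row. This is only a simple counting estimate under stated assumptions, not the estimator used by UltraProp or the statistical test in NEA; the function name and the toy graph are hypothetical.

```python
def compatibility_matrix(edges, labels, num_classes):
    """Row-normalized counts of label pairs over edges with two labeled endpoints.

    Entry [a][b] estimates how often a neighbour of a class-a node has class b.
    Rows for classes with no observed labeled edges fall back to uniform.
    """
    counts = [[0.0] * num_classes for _ in range(num_classes)]
    for u, v in edges:
        if u in labels and v in labels:
            # Undirected edge: count it once in each direction.
            counts[labels[u]][labels[v]] += 1.0
            counts[labels[v]][labels[u]] += 1.0
    matrix = []
    for row in counts:
        total = sum(row)
        if total > 0:
            matrix.append([c / total for c in row])
        else:
            matrix.append([1.0 / num_classes] * num_classes)
    return matrix


if __name__ == "__main__":
    # Toy path graph 0-1-2-3 with alternating labels (made up for illustration).
    edges = [(0, 1), (1, 2), (2, 3)]
    labels = {0: 0, 1: 1, 2: 0, 3: 1}
    for row in compatibility_matrix(edges, labels, num_classes=2):
        print(row)
```

A matrix with a heavy diagonal suggests homophily for those classes, a heavy off-diagonal suggests heterophily, and near-uniform rows suggest that neighbours carry little label information, which matches the label-wise view of network effects described above.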
High content imaging assays can capture rich phenotypic response data for large sets of compound treatments, aiding in the characterization and discovery of novel drugs. However, extracting representative features from high content images that can capture subtle nuances in phenotypes remains challenging. The lack of high-quality labels makes it difficult to achieve satisfactory results with supervised deep learning. Self-supervised learning methods, which learn from automatically generated labels and have shown great success on natural images, offer an attractive alternative for microscopy images as well. However, we find that self-supervised learning techniques underperform on high content imaging assays. One challenge is the undesirable domain shifts present in the data, known as batch effects, which may be caused by biological noise or uncontrolled experimental conditions. To this end, we introduce Cross-Domain Consistency Learning (CDCL), a novel approach that is able to learn in the presence of batch effects. CDCL enforces the learning of biological similarities while disregarding undesirable batch-specific signals, which leads to more useful and versatile representations. These features are organised according to their morphological changes and are more useful for downstream tasks, such as distinguishing treatments and mode of action.
Being able to forecast the popularity of new garment designs is very important in an industry as fast paced as fashion, both in terms of profitability and reducing the problem of unsold inventory. Here, we attempt to address this task in order to provide informative forecasts to fashion designers within a virtual reality designer application that will allow them to fine-tune their creations based on current consumer preferences within an interactive and immersive environment. To achieve this, we have to deal with the following central challenges: (1) the proposed method should not hinder the creative process and thus has to rely only on the garment's visual characteristics, (2) a new garment lacks historical data from which to extrapolate its future popularity, and (3) fashion trends in general are highly dynamic. To this end, we develop a computer vision pipeline fine-tuned on fashion imagery in order to extract relevant visual features along with the category and attributes of the garment. We propose a hierarchical label sharing (HLS) pipeline for automatically capturing hierarchical relations among fashion categories and attributes. Moreover, we propose MuQAR, a Multimodal Quasi-AutoRegressive neural network that forecasts the popularity of new garments by combining their visual features and categorical features, while an autoregressive neural network models the popularity time series of the garment's category and attributes. Both the proposed HLS and MuQAR prove capable of surpassing the current state of the art on key benchmark datasets: DeepFashion for image classification and VISUELLE for new garment sales forecasting.
As aerial robots are tasked to navigate environments of increased complexity, embedding collision tolerance in their design becomes important. In this survey we review the current state-of-the-art within the niche field of collision-tolerant micro aerial vehicles and present different design approaches identified in the literature, as well as methods that have focused on autonomy functionalities that exploit collision resilience. Subsequently, we discuss the relevance to biological systems and provide our view on key directions of future fruitful research.